Abstract

Channel polarization, originally proposed for binary-input channels, is generalized to arbitrary discrete memoryless channels.
Specifically, it is shown that when the input alphabet size is a prime number, a similar construction to that for the binary case
leads to polarization. This method can be extended to channels of composite input alphabet sizes by decomposing such channels
into a set of channels with prime input alphabet sizes. It is also shown that all discrete memoryless channels can be polarized by
randomized constructions. The introduction of randomness does not change the order of complexity of polar code construction,
encoding, and decoding. A previous result on the error probability behavior of polar codes is also extended to the case of arbitrary
discrete memoryless channels. The generalization of polarization to channels with arbitrary finite input alphabet sizes leads to
polar-coding methods for approaching the true (as opposed to symmetric) channel capacity of arbitrary channels with discrete or
continuous input alphabets.

I. Polarization

Channel polarization was introduced in [1] for binary input discrete memoryless channels as a coding technique to construct
codes, called polar codes, for data transmission. Polar codes are capable of achieving the 'symmetric capacity' of any
binary input channel, using low-complexity encoding and decoding algorithms. In terms of the block-length $N$, polar codes
can be encoded and decoded in complexity $O(N \log N)$ and achieve a block error probability that decays roughly like $2^{-\sqrt{N}}$.
The latter result was shown in [2].

The aim of this note is to extend these results of [1], [2] to DMCs with $q$-ary inputs for any finite integer $q \ge 2$. To that
end, we recall the polarization construction and outline how the results above were shown.
Given a binary input channel $W : \mathcal{X} \to \mathcal{Y}$ with $\mathcal{X} = \{0, 1\}$, define its symmetric capacity as

\[ I(W) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}. \]

$I(W)$ is nothing but the mutual information developed between the input and the output of the channel when the input is
uniformly distributed. In [1], two independent copies of $W$ are first combined and then split so as to obtain two unequal binary
input channels $W^-$ and $W^+$. The channel $W^2 : \mathcal{X}^2 \to \mathcal{Y}^2$ describes two uses of the channel $W$,

\[ W^2(y_1, y_2 | x_1, x_2) = W(y_1|x_1) W(y_2|x_2). \]

The inputs $(x_1, x_2)$ to the channel $W^2$ are put in one-to-one correspondence with $(u_1, u_2) \in \mathcal{X}^2$ via $x_1 = (u_1 + u_2) \bmod 2$,
$x_2 = u_2$, thus obtaining the combined channel $W_2 : \mathcal{X}^2 \to \mathcal{Y}^2$ described by

\[ W_2(y_1, y_2 | u_1, u_2) = W^2(y_1, y_2 | u_1 + u_2, u_2) = W(y_1 | u_1 + u_2) W(y_2 | u_2). \]

The split is inspired by the chain rule of mutual information: Let $U_1, U_2, X_1, X_2, Y_1, Y_2$ be random variables corresponding
to their lowercase versions above. If $U_1, U_2$ are independent and uniformly distributed, then so are $X_1, X_2$ and consequently,
on the one hand,

\[ I(U_1, U_2; Y_1, Y_2) = I(X_1, X_2; Y_1, Y_2) = I(X_1; Y_1) + I(X_2; Y_2) = 2 I(W), \]

and on the other

\[ I(U_1, U_2; Y_1, Y_2) = I(U_1; Y_1, Y_2) + I(U_2; Y_1, Y_2, U_1). \]
The split channels $W^-$ and $W^+$ describe those that occur on the right hand side of the equation above:

\[ W^-(y_1, y_2 | u_1) = \sum_{u_2 \in \mathcal{X}} \frac{1}{2} W(y_1 | u_1 + u_2) W(y_2 | u_2), \]

\[ W^+(y_1, y_2, u_1 | u_2) = \frac{1}{2} W(y_1 | u_1 + u_2) W(y_2 | u_2), \]

so that $I(U_1; Y_1, Y_2) = I(W^-)$ and $I(U_2; Y_1, Y_2, U_1) = I(W^+)$.

The polarization construction given in [1] is obtained by a repeated application of $W \mapsto (W^-, W^+)$. Since both $W^-$ and
$W^+$ are binary input channels, one can obtain $W^{--} := (W^-)^-$, $W^{-+} := (W^-)^+$, $W^{+-} := (W^+)^-$, and $W^{++} := (W^+)^+$.

After $n$ levels of application, one obtains $2^n$ channels $W^{-\cdots-}, \ldots, W^{+\cdots+}$. The main observation in [1] is that these channels
polarize in the following sense:

Proposition 1 ([1]). For any $\delta > 0$,

\[ \lim_{n \to \infty} \frac{\#\{ s \in \{+,-\}^n : I(W^s) \in (\delta, 1 - \delta) \}}{2^n} = 0. \tag{2} \]

In other words, except for a vanishing fraction, all the channels obtained at level $n$ are either almost perfect, $I(W^s) \ge 1 - \delta$,
or almost pure noise, $I(W^s) \le \delta$.

As the equality $I(W^-) + I(W^+) = 2I(W)$ leads by induction to $\sum_{s \in \{+,-\}^n} I(W^s) = 2^n I(W)$, one then concludes that
the fraction of almost perfect channels approaches the symmetric capacity. This last observation is the basis of what lets [1]
conclude that polar codes achieve the symmetric capacity.
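
The single combine/split step can be checked numerically. The following is a minimal sketch, not code from [1]: it assumes a channel is stored as a row-stochastic table `W[x][y]`, and the binary symmetric channel with crossover 0.1 is only an illustrative choice. The names `sym_capacity` and `polar_step` are hypothetical helpers introduced here.

```python
import math

def sym_capacity(W):
    """Symmetric capacity I(W), log base q = number of inputs; W[x][y] = W(y|x)."""
    q, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        p_y = sum(W[x][y] for x in range(q)) / q  # output law under uniform input
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log(W[x][y] / p_y, q)
    return total

def polar_step(W):
    """One combine/split step for x1 = u1 + u2 (mod q), x2 = u2."""
    q, ny = len(W), len(W[0])
    # W^-: input u1, output (y1, y2); the unknown u2 is averaged out.
    w_minus = [[sum(W[(u1 + u2) % q][y1] * W[u2][y2] for u2 in range(q)) / q
                for y1 in range(ny) for y2 in range(ny)]
               for u1 in range(q)]
    # W^+: input u2, output (y1, y2, u1).
    w_plus = [[W[(u1 + u2) % q][y1] * W[u2][y2] / q
               for y1 in range(ny) for y2 in range(ny) for u1 in range(q)]
              for u2 in range(q)]
    return w_minus, w_plus

bsc = [[0.9, 0.1], [0.1, 0.9]]  # illustrative channel, not taken from the text
w_minus, w_plus = polar_step(bsc)
i_w = sym_capacity(bsc)
i_minus, i_plus = sym_capacity(w_minus), sym_capacity(w_plus)
print(i_minus, i_w, i_plus)
```

For this example the two new capacities straddle the original one while their sum stays at $2I(W)$, which is the conservation property used in the induction above.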

We give here a new proof of this proposition because it will readily generalize to the $q$-ary input case we will discuss later.
Before we embark on this proof, we introduce the Bhattacharyya parameter for a binary input channel $W : \mathcal{X} \to \mathcal{Y}$, defined
by

\[ Z(W) = \sum_{y} \sqrt{W(y|0) W(y|1)}. \tag{3} \]

The relationship between $Z(W)$, $Z(W^-)$, $Z(W^+)$ and $I(W)$ is already discussed in [1], where the following is shown:

Lemma 1 ([1]).

(i) $Z(W^+) = Z(W)^2$,

(ii) $Z(W^-) \le 2Z(W) - Z(W)^2$,

(iii) $I(W) + Z(W) \ge 1$,

(iv) $I(W)^2 + Z(W)^2 \le 1$.
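
The four relations of Lemma 1 can be spot-checked on a concrete binary-input channel. This is only a sketch under stated assumptions: the asymmetric channel `W` below is an arbitrary illustrative choice, capacities are measured in bits, and the helper names are hypothetical.

```python
import math

def capacity_bits(W):
    """I(W) in bits for a binary-input channel W[x][y] with uniform input."""
    ny = len(W[0])
    total = 0.0
    for y in range(ny):
        p_y = 0.5 * (W[0][y] + W[1][y])
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / p_y)
    return total

def bhattacharyya(W):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1)), as in (3)."""
    return sum(math.sqrt(a * b) for a, b in zip(W[0], W[1]))

def polar_step_binary(W):
    """W^- and W^+ for x1 = u1 XOR u2, x2 = u2."""
    ny = len(W[0])
    w_minus = [[0.5 * sum(W[u1 ^ u2][y1] * W[u2][y2] for u2 in (0, 1))
                for y1 in range(ny) for y2 in range(ny)] for u1 in (0, 1)]
    w_plus = [[0.5 * W[u1 ^ u2][y1] * W[u2][y2]
               for y1 in range(ny) for y2 in range(ny) for u1 in (0, 1)]
              for u2 in (0, 1)]
    return w_minus, w_plus

W = [[0.8, 0.2], [0.3, 0.7]]  # illustrative asymmetric channel
w_minus, w_plus = polar_step_binary(W)
z, i = bhattacharyya(W), capacity_bits(W)
z_minus, z_plus = bhattacharyya(w_minus), bhattacharyya(w_plus)
print(z, z_minus, z_plus, i)
```

A single channel does not prove the lemma, of course; the point is only to make the direction of each inequality concrete.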

Proposition 1 was proved in [1] for the binary case ($q = 2$) using Lemma 1. Unfortunately, Lemma 1 does not generalize to
the non-binary case ($q \ge 3$). The following alternate proof of Proposition 1 uses less stringent conditions that can be fulfilled
for all $q \ge 2$.

Lemma 2. Suppose $B_i$, $i = 1, 2, \ldots$ are i.i.d., $\{+, -\}$-valued random variables with

\[ P(B_1 = -) = P(B_1 = +) = \tfrac{1}{2} \]

defined on a probability space $(\Omega, \mathcal{F}, P)$. Set $\mathcal{F}_0 = \{\emptyset, \Omega\}$ as the trivial $\sigma$-algebra and set $\mathcal{F}_n$, $n \ge 1$, to be the $\sigma$-field
generated by $(B_1, \ldots, B_n)$.

Suppose further that two stochastic processes $\{I_n : n \ge 0\}$ and $\{T_n : n \ge 0\}$ are defined on this probability space with the
following properties:

(i.1) $I_n$ takes values in the interval $[0,1]$ and is measurable with respect to $\mathcal{F}_n$. That is, $I_0$ is a constant, and $I_n$ is a
function of $B_1, \ldots, B_n$.

(i.2) $\{(I_n, \mathcal{F}_n) : n \ge 0\}$ is a martingale.

(t.1) $T_n$ takes values in the interval $[0, 1]$ and is measurable with respect to $\mathcal{F}_n$.

(t.2) $T_{n+1} = T_n^2$ when $B_{n+1} = +$.

(i&t.1) For any $\epsilon > 0$ there exists $\delta > 0$ such that $I_n \in (\epsilon, 1 - \epsilon)$ implies $T_n \in (\delta, 1 - \delta)$.

Then, $I_\infty := \lim_{n \to \infty} I_n$ exists with probability 1, $I_\infty$ takes values in $\{0, 1\}$, and $P(I_\infty = 1) = I_0$.

Proof: The almost sure convergence of $\{I_n\}$ to a limit follows from $\{I_n\}$ being a bounded martingale. Once it is known
that $I_\infty$ is $\{0, 1\}$-valued it will then follow from the martingale property that $P(I_\infty = 1) = E[I_\infty] = I_0$. It thus remains to
prove that $I_\infty$ is $\{0, 1\}$-valued. This, in turn, is equivalent to showing that for any $\eta > 0$,

\[ P(I_\infty \in (\eta, 1 - \eta)) = 0. \]

Since for any $0 < \epsilon < \eta$, the event $\{I_\infty \in (\eta, 1 - \eta)\}$ is included in the event

\[ J_\epsilon := \{\omega : \text{there exists } m \text{ such that for all } n \ge m,\ I_n \in (\epsilon, 1 - \epsilon)\}, \]

and since by property (i&t.1) there exists $\delta > 0$ such that $J_\epsilon \subset K_\delta$ where

\[ K_\delta := \{\omega : \text{there exists } m \text{ such that for all } n \ge m,\ T_n \in (\delta, 1 - \delta)\}, \]

it suffices to prove that $P(K_\delta) = 0$ for any $\delta > 0$. This is trivially true for $\delta \ge 1/2$. Therefore, it suffices to show the claim
for $0 < \delta < 1/2$. Given such a $\delta$, find a positive integer $k$ for which $(1 - \delta)^{2^k} < \delta$. This choice of $k$ guarantees that if a
number $a \in [0, 1 - \delta]$ is squared $k$ times in a row, the result lies in $[0, \delta)$.

For $n \ge 1$ define $E_n$ as the event that $B_n = B_{n+1} = \cdots = B_{n+k-1} = +$, i.e., $E_n$ is the event that there are $k$ consecutive
$+$'s in the sequence $\{B_i : i \ge 1\}$ starting at index $n$. Note that $P(E_n) = 2^{-k} > 0$, and that $\{E_{mk} : m \ge 1\}$ is a collection of
independent events. The Borel-Cantelli lemma thus lets us conclude that the event

\[ E = \{E_n \text{ occurs infinitely often}\} = \{\omega : \text{for every } m \text{ there exists } n \ge m \text{ such that } \omega \in E_n\} \]

has probability 1, and thus $P(K_\delta) = P(K_\delta \cap E)$. We will now show that $K_\delta \cap E$ is empty, from which it will follow that
$P(K_\delta) = 0$. To that end, suppose $\omega \in K_\delta \cap E$. Since $\omega \in K_\delta$, there exists $m$ such that $T_n(\omega) \in (\delta, 1 - \delta)$ whenever $n \ge m$. But
since $\omega \in E$ there exists $n_0 \ge m$ such that $B_{n_0+1} = \cdots = B_{n_0+k} = +$, and thus $T_{n_0+k}(\omega) = T_{n_0}(\omega)^{2^k} < (1 - \delta)^{2^k} < \delta$,
which contradicts $T_{n_0+k}(\omega) \in (\delta, 1 - \delta)$. ■
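
The integer $k$ chosen in the proof can be found by direct squaring. The helper below is a hypothetical illustration, not part of the original argument; it returns the smallest $k$ with $(1-\delta)^{2^k} < \delta$ for $0 < \delta < 1/2$.

```python
def squarings_needed(delta):
    """Smallest k >= 1 such that (1 - delta) squared k times is below delta."""
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie in (0, 1/2)")
    value, k = 1.0 - delta, 0
    while value >= delta:
        value *= value  # one more '+' step: T -> T^2
        k += 1
    return k

k = squarings_needed(0.1)
run_probability = 2.0 ** (-k)  # P(E_n): k consecutive '+' symbols
print(k, run_probability)
```

The smaller $\delta$ is, the longer the run of $+$'s needed, but each such run still has positive probability $2^{-k}$, which is all the Borel-Cantelli step requires.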

Remark 1. The proof of Lemma 2 uses property (t.2) only in the way that repeated squarings of a number in $(\delta, 1 - \delta)$ will
eventually fall outside $(\delta, 1 - \delta)$. Thus, condition (t.2) may be replaced by any other that has this property. E.g., conditioned
on $\mathcal{F}_n$, at least one of the two values of $T_{n+1}$ satisfies

\[ T_{n+1} \le f(T_n) \]

for a nondecreasing $f$ having the property that for any $\delta > 0$, there exists $k$ such that $f^{(k)}(1 - \delta) < \delta$. Here $f^{(k)}$ denotes
the $k$-fold composition of $f$.

Proof of Proposition 1: Let $B_1, B_2, \ldots$ be i.i.d., $\{+, -\}$-valued random variables taking the two values with equal
probability, as in Lemma 2. Define

\[ I_n := I_n(B_1, \ldots, B_n) = I(W^{B_1 \cdots B_n}) \]

and

\[ T_n := T_n(B_1, \ldots, B_n) = Z(W^{B_1 \cdots B_n}). \]

These processes satisfy the conditions of Lemma 2: (i.1) is trivially true with $I_0 = I(W)$; the martingale property (i.2) follows
from $I(W^-) + I(W^+) = 2I(W)$; (t.1) is again trivially true; (t.2) follows from Lemma 1(i); (i&t.1) follows from Lemma 1(iii)
and (iv).

Thus, the process $\{I_n\}$ converges with probability 1 to a $\{0, 1\}$-valued random variable. This implies that

\[ \lim_{n \to \infty} P(I_n \in (\delta, 1 - \delta)) = 0. \]

Note that the distribution of $(B_1, \ldots, B_n)$ is the uniform distribution on $\{+, -\}^n$. Thus,

\[ P(I_n \in (\delta, 1 - \delta)) = \frac{\#\{ s \in \{+,-\}^n : I(W^s) \in (\delta, 1 - \delta) \}}{2^n}, \]

and Proposition 1 follows. ■
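
The statement of Proposition 1 is easy to observe on the binary erasure channel. The sketch below relies on a fact that is not derived in this note but is standard (it is shown in [1]): for an erasure channel with erasure probability $e$, the channels $W^-$ and $W^+$ are again erasure channels, with erasure probabilities $2e - e^2$ and $e^2$, and $I(W) = 1 - e$. The function names are hypothetical.

```python
def bec_erasure_levels(eps, n):
    """Erasure probabilities of the 2^n channels after n polarization levels."""
    levels = [eps]
    for _ in range(n):
        # each channel splits into its '-' and '+' descendants
        levels = [z for e in levels for z in (2 * e - e * e, e * e)]
    return levels

def unpolarized_fraction(eps, n, delta):
    """Fraction of level-n channels with I(W^s) = 1 - e inside (delta, 1 - delta)."""
    levels = bec_erasure_levels(eps, n)
    return sum(1 for e in levels if delta < 1 - e < 1 - delta) / len(levels)

frac_4 = unpolarized_fraction(0.5, 4, 0.1)
frac_16 = unpolarized_fraction(0.5, 16, 0.1)
# Martingale property (i.2): the average capacity stays at I(W) = 0.5.
mean_16 = sum(1 - e for e in bec_erasure_levels(0.5, 16)) / 2 ** 16
print(frac_4, frac_16, mean_16)
```

The fraction of channels left in the middle shrinks with $n$ while the average capacity is preserved, so the fraction of nearly perfect channels must approach $I(W)$.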
The following lemma was proved in [2].

Lemma 3 ([2]). Suppose that the processes $\{B_n\}$, $\{I_n\}$ and $\{T_n\}$, in addition to the conditions (i.1), (i.2), (t.1), (t.2) and
(i&t.1) in Lemma 2, also satisfy

(t.3) For some constant $k$, $T_{n+1} \le k T_n$ when $B_{n+1} = -$.

(i&t.2) For any $\epsilon > 0$ there exists $\delta > 0$ such that $I_n > 1 - \delta$ implies $T_n < \epsilon$.

Then, for any $0 < \beta < 1/2$,

\[ \lim_{n \to \infty} P\big(T_n \le 2^{-2^{\beta n}}\big) = I_0. \tag{4} \]

Note that in the proof of Proposition 1 the random variable $T_n$ denotes the Bhattacharyya parameter of a randomly chosen
channel after $n$ steps of polarization. Therefore, Lemma 3 states that after $n$ steps of polarization, almost all 'good' channels
will have Bhattacharyya parameters that are smaller than $2^{-2^{\beta n}}$ for any $\beta < 1/2$, provided that $n$ is sufficiently large. Since
the Bhattacharyya parameter is an upper bound to the error probability of uncoded transmission, this implies that, at any fixed
coding rate below $I_0 = I(W)$, the block error probability $P_e$ of binary polar codes under successive cancellation decoding
will satisfy

\[ P_e \le 2^{-N^{\beta}} \quad \text{for all } \beta < 1/2, \tag{5} \]

when the block-length $N = 2^n$ is sufficiently large.

II. Polarization for $q$-ary Input Channels

In this section we will show how the transformation $(u_1, u_2) \mapsto (x_1, x_2)$ (and consequently $W \mapsto (W^-, W^+)$) and the
definition of $Z(W)$ can be modified so that the hypotheses of Lemmas 2 and 3 are satisfied when the channel input alphabet
is not binary. This will establish that the new transformation satisfies equation (2), leading to the conclusion that $q$-ary polar
codes achieve symmetric capacity. That the error probability behaves roughly like $2^{-\sqrt{N}}$ will also follow.

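To that end, let $q$ denote the cardinality of the channel input alphabet $\mathcal{X}$ and define

\[ I(W) := \sum_{x \in \mathcal{X}} \frac{1}{q} \sum_{y \in \mathcal{Y}} W(y|x) \log \frac{W(y|x)}{\sum_{x' \in \mathcal{X}} \frac{1}{q} W(y|x')} \]

as the symmetric capacity of a channel $W$. We will take the base of the logarithm in this mutual information equal to the
input alphabet size $q$, so that $0 \le I(W) \le 1$.

For any pair of input letters $x, x' \in \mathcal{X}$, we define the Bhattacharyya distance between them as

\[ Z(W_{\{x,x'\}}) = \sum_{y} \sqrt{W(y|x) W(y|x')}. \tag{6} \]

Here, the notation $W_{\{x,x'\}}$ should be interpreted as denoting the channel obtained by restricting the input alphabet of $W$ to
the subset $\{x, x'\} \subset \mathcal{X}$. We also define the average Bhattacharyya distance of $W$ as

\[ Z(W) = \frac{1}{q(q-1)} \sum_{x, x' \in \mathcal{X} :\, x \ne x'} Z(W_{\{x,x'\}}). \tag{7} \]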

The average Bhattacharyya distance upper bounds the error probability of uncoded transmission:

Proposition 2. Given a $q$-ary input channel $W$, let $P_e$ denote the error probability of the maximum-likelihood decoder for a
single channel use. Then,

\[ P_e \le (q - 1) Z(W). \]

Proof: Let $P_{e,x}$ denote the error probability of maximum-likelihood decoding when $x \in \mathcal{X}$ is sent. We have

\begin{align*}
P_{e,x} &\le P\big(y : W(y|x') \ge W(y|x) \text{ for some } x' \ne x \,\big|\, x \text{ is sent}\big) \\
&= \sum_{y :\, \exists x' \ne x,\ W(y|x') \ge W(y|x)} W(y|x)
\;\le\; \sum_{y} \sum_{x' :\, x' \ne x,\ W(y|x') \ge W(y|x)} W(y|x)
\;\le\; \sum_{y} \sum_{x' :\, x' \ne x} \sqrt{W(y|x) W(y|x')}.
\end{align*}

Therefore the average error probability is bounded as

\[ P_e = \frac{1}{q} \sum_{x \in \mathcal{X}} P_{e,x} \le \frac{1}{q} \sum_{x \in \mathcal{X}} \sum_{x' \ne x} \sum_{y} \sqrt{W(y|x) W(y|x')} = (q - 1) Z(W). \]

■
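
Proposition 2 can be illustrated numerically. The sketch below assumes uniform inputs and computes the maximum-likelihood error probability as $1 - \frac{1}{q}\sum_y \max_x W(y|x)$ (ties resolved in the decoder's favor, which can only make the bound easier to meet). The ternary channel and the function names are illustrative assumptions.

```python
import math

def avg_bhattacharyya(W):
    """Average Bhattacharyya distance Z(W) of (7); W[x][y] = W(y|x)."""
    q = len(W)
    total = 0.0
    for x in range(q):
        for xp in range(q):
            if x != xp:
                total += sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))
    return total / (q * (q - 1))

def ml_error(W):
    """Error probability of ML decoding of one uncoded, uniformly chosen input."""
    q, ny = len(W), len(W[0])
    return 1.0 - sum(max(W[x][y] for x in range(q)) for y in range(ny)) / q

W = [[0.7, 0.2, 0.1],
     [0.1, 0.7, 0.2],
     [0.2, 0.1, 0.7]]  # illustrative ternary channel
q = len(W)
p_e, bound = ml_error(W), (q - 1) * avg_bhattacharyya(W)
print(p_e, bound)
```

For a noisy channel such as this one the bound is loose; it becomes informative for the nearly perfect channels produced by polarization, where $Z(W)$ is close to zero.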

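Proposition 3. We have the following relationships between $I(W)$ and $Z(W)$:

\[ I(W) \ge \log \frac{q}{1 + (q - 1) Z(W)}, \tag{8} \]

\[ I(W) \le \log(q/2) + (\log 2) \sqrt{1 - Z(W)^2}, \tag{9} \]

\[ I(W) \le 2(q - 1)(\log e) \sqrt{1 - Z(W)^2}. \tag{10} \]

Proof is given in the Appendix.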



A. Special case: Prime input alphabet sizes 

We will see that when the input alphabet size $q$ is a prime number, polarization can be achieved by similar constructions to
the one for the binary case. For this purpose, we will equip the input alphabet $\mathcal{X}$ with an operation '+' so that $(\mathcal{X}, +)$ forms a
group. (This is possible whether or not $q$ is prime.) We will let $0$ denote the identity element of $(\mathcal{X}, +)$. In particular, we may
assume that $\mathcal{X} = \{0, \ldots, q - 1\}$ and that '+' denotes modulo-$q$ addition. Note that when $q$ is prime, this is the only group of
order $q$.

As in the binary case, we combine two independent copies of $W$ by choosing the input to each copy as


\[ x_1 = u_1 + u_2, \qquad x_2 = u_2. \tag{11} \]

We define the channels $W^-$ and $W^+$ through

\begin{align*}
W^-(y_1, y_2 \mid u_1) &= \sum_{u_2 \in \mathcal{X}} \frac{1}{q} W_2(y_1, y_2 \mid u_1, u_2), \\
W^+(y_1, y_2, u_1 \mid u_2) &= \frac{1}{q} W_2(y_1, y_2 \mid u_1, u_2), \tag{12}
\end{align*}

where again $W_2(y_1, y_2 \mid u_1, u_2) = W(y_1 \mid u_1 + u_2) W(y_2 \mid u_2)$.

The main result of this section is the following:

Theorem 1. The transformation described in (11) and (12) polarizes all $q$-ary input channels in the sense of Proposition 1,
provided that $q$ is a prime number. The rate of polarization under this transformation is the same as in the binary case, in the
sense that the block error probabilities of polar codes based on this transformation satisfy (5).

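To prove Theorem 1 we first rewrite $Z(W)$ as

\[ Z(W) = \frac{1}{q - 1} \sum_{d \ne 0} Z_d(W), \]

where we define

\[ Z_d(W) = \frac{1}{q} \sum_{x \in \mathcal{X}} Z(W_{\{x, x+d\}}), \qquad d \ne 0. \]

We also define

\[ Z_{\max}(W) = \max_{d \ne 0} Z_d(W). \]

We will use the following lemma in the proof.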

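Lemma 4. Given a channel $W$ whose input alphabet size $q$ is prime, if $Z_{\max}(W) \ge 1 - \delta$, then $Z(W) \ge 1 - q(q - 1)^2 \delta$ for
all $\delta > 0$.

Proof: Let $d$ be such that $Z_{\max}(W) = Z_d(W)$, and note that $Z_d(W) \ge 1 - \delta$ implies

\[ 1 - Z(W_{\{x, x+d\}}) \le q\delta \quad \text{for all } x \in \mathcal{X}. \]

For a given $x \in \mathcal{X}$ define

\[ a_y = \sqrt{W(y \mid x)} - \sqrt{W(y \mid x + d)}, \]

\[ b_y = \sqrt{W(y \mid x + d)} - \sqrt{W(y \mid x + d + d)} \]

for all $y \in \mathcal{Y}$. The triangle inequality states that

\[ \Big( \sum_y (a_y + b_y)^2 \Big)^{1/2} \le \Big( \sum_y a_y^2 \Big)^{1/2} + \Big( \sum_y b_y^2 \Big)^{1/2}, \]

or equivalently, that

\begin{align*}
\sqrt{1 - Z(W_{\{x, x+d+d\}})} &\le \sqrt{1 - Z(W_{\{x, x+d\}})} + \sqrt{1 - Z(W_{\{x+d, x+d+d\}})} \\
&\le 2\sqrt{q\delta}. \tag{13}
\end{align*}

On the other hand, since $q$ is prime, the input alphabet can be written as

\[ \mathcal{X} = \{x,\ x + d,\ x + d + d,\ \ldots,\ x + \underbrace{d + \cdots + d}_{q - 1 \text{ times}}\} \]

for any $d \ne 0$ and $x \in \mathcal{X}$. Hence, applying inequality (13) repeatedly yields

\[ \sqrt{1 - Z(W_{\{x, x'\}})} \le (q - 1)\sqrt{q\delta} \]

for all $x, x' \in \mathcal{X}$, which implies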

\[ Z(W) = \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} Z(W_{\{x, x'\}}) \ge 1 - q(q - 1)^2 \delta. \]

■

Proof of Theorem 1: The proof is similar to the one for the binary case: Let $B_1, B_2, \ldots$ be i.i.d. $\{+, -\}$-valued random
variables taking the two values with equal probability. Define the random processes

\[ I_n := I_n(B_1, \ldots, B_n) = I(W^{B_1 \cdots B_n}) \]

and

\[ T_n := T_n(B_1, \ldots, B_n) = Z_{\max}(W^{B_1 \cdots B_n}), \]

with $I_0 = I(W)$ and $T_0 = Z_{\max}(W)$. It suffices to show that $\{I_n\}$ and $\{T_n\}$ satisfy the conditions of Lemmas 2 and 3.
Conditions (i.1), (i.2), and (t.1) hold trivially. Also, by (8) and (10) in Proposition 3, for any $\epsilon > 0$ there exists $\delta > 0$ such that

\[ I(W) \in (\epsilon, 1 - \epsilon) \text{ implies } Z(W) \in (\delta, 1 - \delta). \]

Furthermore, it follows from Lemma 4 that for any $\delta > 0$

\[ Z(W) \in (\delta, 1 - \delta) \text{ implies } Z_{\max}(W) \in \big(\delta,\ 1 - \delta / [q(q - 1)^2]\big), \]

from which (i&t.1) follows. To show (t.2), we write

\begin{align*}
Z_d(W^+) &= \frac{1}{q} \sum_{x} Z(W^+_{\{x, x+d\}}) \\
&= \frac{1}{q} \sum_{x} \frac{1}{q} \sum_{y_1, y_2, u} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x + d + u)} \sqrt{W(y_2 \mid x) W(y_2 \mid x + d)} \\
&= \frac{1}{q} \sum_{x} Z(W_{\{x, x+d\}}) \cdot \frac{1}{q} \sum_{u} Z(W_{\{x+u, x+u+d\}}) \\
&= Z_d(W)^2,
\end{align*}

which implies $Z_{\max}(W^+) = Z_{\max}(W)^2$, or equivalently $T_{n+1} = T_n^2$ when $B_{n+1} = +$. Similarly, one can bound $Z_d(W^-)$ as



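\begin{align*}
Z_d(W^-) &= \frac{1}{q} \sum_{x} \sum_{y_1, y_2} \sqrt{ \frac{1}{q} \sum_{u} W(y_1 \mid x + u) W(y_2 \mid u) \cdot \frac{1}{q} \sum_{v} W(y_1 \mid x + d + v) W(y_2 \mid v) } \\
&\le \frac{1}{q} \sum_{x} \sum_{y_1, y_2} \frac{1}{q} \sum_{u, v} \sqrt{W(y_1 \mid x + u) W(y_2 \mid u) W(y_1 \mid x + d + v) W(y_2 \mid v)} \\
&= \frac{1}{q} \sum_{u} \frac{1}{q} \sum_{x} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x + d + u)} \\
&\quad + \frac{1}{q} \sum_{u, v :\, u \ne v} \sum_{y_2} \sqrt{W(y_2 \mid u) W(y_2 \mid v)} \cdot \frac{1}{q} \sum_{x} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x + d + v)} \\
&= Z_d(W) + \sum_{\Delta \ne 0} \frac{1}{q} \sum_{u} \sum_{y_2} \sqrt{W(y_2 \mid u) W(y_2 \mid u + \Delta)} \cdot \frac{1}{q} \sum_{x} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x + d + u + \Delta)} \\
&= 2 Z_d(W) + \sum_{\Delta \ne 0,\, -d} Z_\Delta(W) Z_{d + \Delta}(W) \\
&\le 2 Z_d(W) + (q - 2) Z_{\max}(W)^2.
\end{align*}

Here the term with $\Delta = -d$ contributes $Z_{-d}(W) = Z_d(W)$, because the inner sum over $y_1$ then equals 1.

Thus we have $Z_{\max}(W^-) \le 2 Z_{\max}(W) + (q - 2) Z_{\max}(W)^2 \le q Z_{\max}(W)$, which implies (t.3). Finally, (i&t.2) follows from
(9) and the relation $Z_{\max}(W) \le q Z(W)$. ■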
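
The two relations established in the proof, $Z_d(W^+) = Z_d(W)^2$ and $Z_d(W^-) \le 2Z_d(W) + (q-2)Z_{\max}(W)^2$, can be checked on a small example. The following is a sketch under stated assumptions: the ternary channel is an arbitrary illustrative choice, '+' is modulo-$q$ addition as in (11), and the helper names are hypothetical.

```python
import math

def z_pair(W, a, b):
    """Bhattacharyya distance Z(W_{a,b}) between input letters a and b."""
    return sum(math.sqrt(p * r) for p, r in zip(W[a], W[b]))

def z_d(W, d):
    """Z_d(W) = (1/q) sum_x Z(W_{x, x+d}) with modulo-q addition."""
    q = len(W)
    return sum(z_pair(W, x, (x + d) % q) for x in range(q)) / q

def polar_step(W):
    """W^- and W^+ of (12) for x1 = u1 + u2 (mod q), x2 = u2."""
    q, ny = len(W), len(W[0])
    w_minus = [[sum(W[(u1 + u2) % q][y1] * W[u2][y2] for u2 in range(q)) / q
                for y1 in range(ny) for y2 in range(ny)]
               for u1 in range(q)]
    w_plus = [[W[(u1 + u2) % q][y1] * W[u2][y2] / q
               for y1 in range(ny) for y2 in range(ny) for u1 in range(q)]
              for u2 in range(q)]
    return w_minus, w_plus

W = [[0.7, 0.2, 0.1],
     [0.1, 0.6, 0.3],
     [0.25, 0.25, 0.5]]  # illustrative ternary channel (q = 3 is prime)
q = len(W)
w_minus, w_plus = polar_step(W)
z_max = max(z_d(W, d) for d in range(1, q))
for d in range(1, q):
    print(d, z_d(W, d), z_d(w_plus, d), z_d(w_minus, d))
```

Primality of $q$ is not needed for these two relations themselves; it enters through Lemma 4, which ties $Z_{\max}(W)$ back to $Z(W)$.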



B. Arbitrary input alphabet sizes

The proof of Lemma 4, and hence of Theorem 1, depends critically on the assumption that $q$ is a prime number, and does
not extend trivially to the case of composite input alphabet sizes. In fact, it is possible to find channels that the transformation
given in the previous section will not polarize:

Example 1. Consider the quaternary-input channel $W : \{0, 1, 2, 3\} \to \{0, 1\}$ defined by the transition probabilities $W(0 \mid 0) =
W(0 \mid 2) = W(1 \mid 1) = W(1 \mid 3) = 1$, with $I(W) = \log 2$. If $W$ is combined/split using the transformation described in (11)
and (12), where + denotes modulo-4 addition, then the channels $W^-$ and $W^+$ are statistically equivalent to $W$. Therefore

\[ I(W^-) = I(W) = I(W^+). \]
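
Example 1 can be reproduced directly from the definitions in (11) and (12). The sketch below uses hypothetical helper names and takes logarithms to base $q = 4$, so that $I(W) = \log_4 2 = 1/2$.

```python
import math

def sym_capacity(W):
    """Symmetric capacity with log base q = len(W); W[x][y] = W(y|x)."""
    q, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        p_y = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log(W[x][y] / p_y, q)
    return total

def polar_step(W):
    """W^- and W^+ for x1 = u1 + u2 (mod q), x2 = u2."""
    q, ny = len(W), len(W[0])
    w_minus = [[sum(W[(u1 + u2) % q][y1] * W[u2][y2] for u2 in range(q)) / q
                for y1 in range(ny) for y2 in range(ny)]
               for u1 in range(q)]
    w_plus = [[W[(u1 + u2) % q][y1] * W[u2][y2] / q
               for y1 in range(ny) for y2 in range(ny) for u1 in range(q)]
              for u2 in range(q)]
    return w_minus, w_plus

# The channel of Example 1: the output is the parity of the input.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w_minus, w_plus = polar_step(W)
i_w, i_minus, i_plus = sym_capacity(W), sym_capacity(w_minus), sym_capacity(w_plus)
print(i_minus, i_w, i_plus)
```

All three capacities coincide, so repeating the step never moves any channel toward 0 or 1: the output only ever reveals the parity of the input, and modulo-4 addition preserves that structure.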
For the general case, our first attempt at finding a polarizing transformation is to let

\[ x_1 = u_1 + u_2, \qquad x_2 = \pi(u_2), \]

where '+' denotes the group operation, and $\pi$ is a fixed permutation on $\mathcal{X}$. In this case one can compute easily that

\[ Z(W^+) = \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} Z(W_{\{\pi(x), \pi(x')\}}) \, \frac{1}{q} \sum_{u} Z(W_{\{u+x, u+x'\}}). \]

To be able to mimic the proof of Proposition 1 one would want that $Z(W^+) = Z(W)^2$. However, as the value of the inner
sum above may depend on $(x, x')$, the equality $Z(W^+) = Z(W)^2$ will not necessarily hold in general.

As we will see, however, the average value of the above $Z(W^+)$ over all possible choices of $\pi$ is $Z(W)^2$. For this reason,
it is appropriate to think of a randomized channel combining/splitting operation, where the randomness is over the choice of
$\pi$. To accommodate this randomness, again let $(U_1, U_2)$ denote the independent and uniformly distributed inputs, and let $\Pi$ be
chosen uniformly at random from the set of permutations $\mathcal{P}_{\mathcal{X}}$, independently of $(U_1, U_2)$, and revealed to the receiver. Set

\[ (X_1, X_2) = (U_1 + U_2, \Pi(U_2)). \tag{14} \]

Observe that

\begin{align*}
I(U_1, U_2; Y_1, Y_2, \Pi) &= 2 I(W) \\
&= I(U_1; Y_1, Y_2, \Pi) + I(U_2; Y_1, Y_2, U_1, \Pi),
\end{align*}

and that we may define the channels $W^- : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{P}_{\mathcal{X}}$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X} \times \mathcal{P}_{\mathcal{X}}$ so that the terms on the right hand
side equal $I(W^-)$ and $I(W^+)$:

\[ W^-(y_1, y_2, \pi \mid u_1) = \sum_{u_2 \in \mathcal{X}} \frac{1}{q \cdot q!} W_2(y_1, y_2 \mid u_1, u_2), \tag{15} \]

\[ W^+(y_1, y_2, u_1, \pi \mid u_2) = \frac{1}{q \cdot q!} W_2(y_1, y_2 \mid u_1, u_2), \tag{16} \]

where $W_2(y_1, y_2 \mid u_1, u_2) = W(y_1 \mid u_1 + u_2) W(y_2 \mid \pi(u_2))$.

Theorem 2. The transformation described in (14), (15), and (16) polarizes all discrete memoryless channels $W$ in the sense
of Proposition 1.

Proof: As in the binary case, we will let $B_1, B_2, \ldots$ be i.i.d., $\{+, -\}$-valued random variables taking the two values with
equal probability, and define

\begin{align*}
I_n &:= I_n(B_1, \ldots, B_n) = I(W^{B_1 \cdots B_n}), \\
T_n &:= T_n(B_1, \ldots, B_n) = Z(W^{B_1 \cdots B_n}),
\end{align*}

with $I_0 = I(W)$ and $T_0 = Z(W)$. We will prove the theorem by showing that the processes $\{I_n\}$ and $\{T_n\}$ satisfy the
conditions of Lemma 2. Since (i.1), (i.2), (t.1) are readily seen to hold, and (i&t.1) is implied by inequalities (8) and (10) in
Proposition 3, we only need to show (t.2). To that end observe that

\[ Z(W^+) = \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q!} \sum_{\pi} Z(W_{\{\pi(x), \pi(x')\}}) \, \frac{1}{q} \sum_{u} Z(W_{\{u+x, u+x'\}}). \]

Note that for any $x \ne x'$ the value of $\frac{1}{q!} \sum_{\pi} Z(W_{\{\pi(x), \pi(x')\}})$ is equal to $Z(W)$, and for any $u$, the value of $\frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} Z(W_{\{u+x, u+x'\}})$
also equals $Z(W)$. Thus, $Z(W^+) = Z(W)^2$. ■
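
The averaging argument can be verified by brute force for a small alphabet. The sketch below builds $W^+$ exactly as in (16), with the permutation included in the channel output, and compares $Z(W^+)$ with $Z(W)^2$. The quaternary channel, the use of modulo-$q$ addition as the group operation, and the helper names are illustrative assumptions. The same code is run with permutations restricted to those fixing 0, the smaller set considered in Theorem 3.

```python
import itertools
import math

def avg_bhattacharyya(W):
    """Average Bhattacharyya distance Z(W) of (7); W[x][y] = W(y|x)."""
    q = len(W)
    total = 0.0
    for x in range(q):
        for xp in range(q):
            if x != xp:
                total += sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))
    return total / (q * (q - 1))

def plus_channel(W, perms):
    """W^+ as in (16), with Pi uniform over the list 'perms' and revealed."""
    q, ny = len(W), len(W[0])
    scale = 1.0 / (q * len(perms))
    # input u2; output (pi, u1, y1, y2)
    return [[scale * W[(u1 + u2) % q][y1] * W[pi[u2]][y2]
             for pi in perms for u1 in range(q)
             for y1 in range(ny) for y2 in range(ny)]
            for u2 in range(q)]

W = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.1, 0.8],
     [0.4, 0.4, 0.2]]  # illustrative quaternary channel
q = len(W)
all_perms = list(itertools.permutations(range(q)))
fix_zero = [p for p in all_perms if p[0] == 0]

z_w = avg_bhattacharyya(W)
z_all = avg_bhattacharyya(plus_channel(W, all_perms))
z_fix0 = avg_bhattacharyya(plus_channel(W, fix_zero))
z_identity = avg_bhattacharyya(plus_channel(W, [tuple(range(q))]))
print(z_w ** 2, z_all, z_fix0, z_identity)
```

The value for a single fixed permutation (here the identity) is printed for comparison only; as noted before Theorem 2, it need not equal $Z(W)^2$.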

As $Z(W)$ upper bounds the error probability of uncoded transmission (cf. Proposition 2), in order to bound the error
probability of $q$-ary polar codes it suffices to show that the hypotheses of Lemma 3 hold. Since (i&t.2) is already implied
by (9), it remains to show (t.3):

Proposition 4. For the transformation described in (14), (15), and (16), we have

\[ Z(W) \le Z(W^-) \le \min\{ q Z(W),\ 2Z(W) + (q - 1) Z(W)^2 \}. \]

Proof is given in the Appendix.

We have seen that choosing the transformation $W \mapsto (W^-, W^+)$ in a random fashion from a set of transformations of size
$q!$ yields $Z(W^+) = Z(W)^2$, leading to channel polarization. In particular, for each $W$ there is at least one transformation with
$Z(W^+) \le Z(W)^2$. Therefore, randomness is needed only in order to find such transformations at code construction stage,
and not for encoding/decoding.

In a channel polarization construction of size $N$, there are $(2N - 1)$ channels ($W$, $W^-$, $W^+$, $W^{--}$, $W^{-+}$, etc.) in the
recursion tree of code construction. For each channel $W$ residing in any one of the $(N - 1)$ internal nodes of this tree, we need
to find a suitable permutation $\pi$ such that $Z(W^+) \le Z(W)^2$. Thus, the total complexity of finding the right permutations scales
as $q!(N - 1)$, in the worst case where all $q!$ permutations are considered. Recall that polar code construction also requires
determining the frozen coordinates, which is a task of complexity $\Omega(N)$ at best. So, the order of polar code construction
complexity is not altered by the introduction of randomization.



III. Complementary Remarks 

A. Reduction of randomness 

The transformation $(u_1, u_2) \mapsto (x_1, x_2)$ described above uses a random permutation to satisfy $Z(W^+) = Z(W)^2$. This
amount of randomness, over a set of size $q!$, is in general not necessary; randomization over a set of size $(q - 1)!$ is
sufficient:

Theorem 3. If the random permutation $\Pi$ that defines (14) is chosen uniformly over the set of permutations for which $0$ is a
fixed point, the resulting transformation yields $Z(W^+) = Z(W)^2$ and thus is polarizing.

A more significant reduction in randomness can be attained when the input alphabet $\mathcal{X}$ can be equipped with operations
$(+, \cdot)$ to form an algebraic field; this is possible if and only if $q$ is a prime power. A random variable taking only $q - 1$
values is sufficient in this case. (We have already seen that no randomization is needed when $q$ is prime.) To see this, pick $R$
to be uniformly distributed over the non-zero elements $\mathcal{X}_*$ of $\mathcal{X}$, reveal it to the receiver and set

\[ (X_1, X_2) = (U_1 + U_2, R \cdot U_2). \tag{17} \]

As above, we have

\[ 2I(W) = I(U_1, U_2; Y_1, Y_2, R) = I(U_1; Y_1, Y_2, R) + I(U_2; Y_1, Y_2, U_1, R) = I(W^-) + I(W^+) \]

provided that we define $W^- : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}_*$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X} \times \mathcal{X}_*$ as

\[ W^-(y_1, y_2, r \mid u_1) = \frac{1}{q(q-1)} \sum_{u_2 \in \mathcal{X}} W(y_1 \mid u_1 + u_2) W(y_2 \mid r \cdot u_2), \tag{18} \]

\[ W^+(y_1, y_2, u_1, r \mid u_2) = \frac{1}{q(q-1)} W(y_1 \mid u_1 + u_2) W(y_2 \mid r \cdot u_2). \tag{19} \]

Theorem 4. The transformation described in (17), (18), and (19) polarizes all $q$-ary input channels in the sense of Proposition 1,
provided that $q$ is a prime power.

Proof: Again, we only need to show that $Z(W^+) = Z(W)^2$. To that end observe that

\[ Z(W^+) = \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q-1} \sum_{r \ne 0} Z(W_{\{r \cdot x, r \cdot x'\}}) \, \frac{1}{q} \sum_{u} Z(W_{\{u+x, u+x'\}}). \]

Writing $x' = x + d$ and $u' = u + x$, we can rewrite the above as

\[ Z(W^+) = \frac{1}{q^2 (q-1)^2} \sum_{d \ne 0} \sum_{x} \sum_{r \ne 0} Z(W_{\{r \cdot x,\, r \cdot x + r \cdot d\}}) \sum_{u'} Z(W_{\{u', u'+d\}}). \]

Noting that for any fixed $d$, the sum $\frac{1}{q(q-1)} \sum_{x, r \ne 0} Z(W_{\{r \cdot x,\, r \cdot x + r \cdot d\}})$ equals $Z(W)$, and that the sum $\frac{1}{q(q-1)} \sum_{u', d \ne 0} Z(W_{\{u', u'+d\}})$
also equals $Z(W)$ yields $Z(W^+) = Z(W)^2$. ■

When the field is of odd characteristic (i.e., when $q$ is not a power of two), a further reduction is possible: since $\sum_{u'} Z(W_{\{u', u'+d\}})$
is invariant under $d \to -d$, one can show that the range of $R$ can be reduced from $\mathcal{X}_*$ to only half of the elements in $\mathcal{X}_*$, by
partitioning $\mathcal{X}_*$ into two equal parts in one-to-one correspondence via $r \mapsto -r$, and picking one of the parts as the range of $R$.
It is easy to show that choosing $R$ uniformly at random over this set of size $(q - 1)/2$ will also yield $Z(W^+) = Z(W)^2$.



B. A method to avoid randomness 

When the input alphabet size $q$ is not prime, an alternative multi-level code construction technique can be used in order to
avoid randomness: Consider a channel $W$ with input alphabet size $q = \prod_{i=1}^{L} q_i$, where the $q_i$'s are the prime factors of $q$. When
the input $X$ to $W$ is uniformly distributed on $\mathcal{X}$, one can write $X = (U_1, \ldots, U_L)$, where the $U_i$'s are independent and uniformly
distributed on their respective ranges $\mathcal{U}_i = \{0, \ldots, q_i - 1\}$. Defining the channels $W^{(i)} : \mathcal{U}_i \to \mathcal{Y} \times \mathcal{U}_1 \times \cdots \times \mathcal{U}_{i-1}$ through

\[ W^{(i)}(y, u^{i-1} \mid u_i) = \prod_{j \ne i} q_j^{-1} \sum_{u_{i+1}^{L}} W\big(y \mid (u^{i-1}, u_i, u_{i+1}^{L})\big), \]

it is easily seen that

\[ I(W) = I(X; Y) = I(U^L; Y) = \sum_{i} I(U_i; Y, U^{i-1}) = \sum_{i} I(W^{(i)}). \]

Having decomposed $W$ into $W^{(1)}, \ldots, W^{(L)}$, one can polarize each channel $W^{(i)}$ separately. The order of successive
cancellation decoding in this multi-level construction is to first decode all channels derived from $W^{(1)}$, then all channels
derived from $W^{(2)}$, and so on. Since the input alphabet size of each channel is prime, no randomization is needed.
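
The decomposition can be illustrated for two factors, $q = q_1 q_2$. The sketch below is only an illustration under several assumptions that are not in the text: the input letter is labeled $x = u_1 q_2 + u_2$, all mutual informations are measured in bits so that the chain rule can be checked without changing logarithm bases, and the six-input channel and helper names are made up for the example.

```python
import math

def mutual_information_bits(W):
    """I(X;Y) in bits for uniform X; W[x][y] = W(y|x)."""
    q, ny = len(W), len(W[0])
    p_y = [sum(W[x][y] for x in range(q)) / q for y in range(ny)]
    return sum(W[x][y] / q * math.log2(W[x][y] / p_y[y])
               for x in range(q) for y in range(ny) if W[x][y] > 0)

def two_level_split(W, q1, q2):
    """Channels W^(1): u1 -> y and W^(2): u2 -> (y, u1), with x = u1*q2 + u2."""
    ny = len(W[0])
    # W^(1): the not-yet-decoded digit u2 is averaged out.
    w1 = [[sum(W[u1 * q2 + u2][y] for u2 in range(q2)) / q2 for y in range(ny)]
          for u1 in range(q1)]
    # W^(2): the already-decoded digit u1 is part of the output.
    w2 = [[W[u1 * q2 + u2][y] / q1 for u1 in range(q1) for y in range(ny)]
          for u2 in range(q2)]
    return w1, w2

W = [[0.70, 0.20, 0.10],
     [0.10, 0.80, 0.10],
     [0.30, 0.30, 0.40],
     [0.05, 0.15, 0.80],
     [0.50, 0.25, 0.25],
     [0.20, 0.60, 0.20]]  # illustrative channel with q = 6 = 2 * 3
w1, w2 = two_level_split(W, 2, 3)
i_w = mutual_information_bits(W)
i_1, i_2 = mutual_information_bits(w1), mutual_information_bits(w2)
print(i_w, i_1, i_2)
```

Each of the two component channels has a prime input alphabet (sizes 2 and 3 here), so each can be polarized with the deterministic transformation of Section II-A.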

C. Equidistant channels 

A channel $W$ is said to be equidistant if $Z(W_{\{x,x'\}})$ is constant for all pairs of distinct input letters $x$ and $x'$. These are
channels with a high degree of symmetry. In particular, if a channel $W$ is equidistant, then so are the channels $W^+$ and $W^-$
created by the deterministic mapping $(u_1, u_2) \mapsto (u_1 + u_2, u_2)$. By similar arguments to those in Section II-A it follows that
this mapping polarizes equidistant channels, regardless of the input alphabet size.

D. How to achieve channel capacity using polar codes 

In all of the above, the input letters of the channel under consideration were used with equal frequency. This was sufficient 
to achieve the symmetric channel capacity. However, in order to achieve the true channel capacity, one should be able to use 
the channel inputs with non-uniform frequencies in general. The following method, discussed in [3, p. 208], shows how to
implement non-uniform input distributions within the polar coding framework.

Given two finite sets $\mathcal{X}$ and $\mathcal{X}'$ with $m = |\mathcal{X}'|$, any distribution $P_X$ on $\mathcal{X}$ for which $m P_X(x)$ is an integer for all $x$ can be
induced by the uniform distribution on $\mathcal{X}'$ and a deterministic map $f : \mathcal{X}' \to \mathcal{X}$.

Given a channel $W : \mathcal{X} \to \mathcal{Y}$, and a distribution $P_X$ as above, we can construct the channel $W' : \mathcal{X}' \to \mathcal{Y}$ whose input
alphabet is $\mathcal{X}'$ and $W'(y|x') = W(y|f(x'))$. Then $I(W')$ is the same as the mutual information developed between the input
and output of the channel $W$ when the input distribution is $P_X$. Consequently, a method that achieves the symmetric capacity
of any discrete memoryless channel, such as the channel polarization method considered in this paper, can be extended to 
approach the true capacity of any discrete memoryless channel by taking Px as a rational distribution approximating the 
capacity achieving distribution. (In order to avoid randomization, one may use prime m in the constructions.) 
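
The construction of $W'$ is short enough to write out. The following sketch is illustrative only: the Z-channel, the target distribution $P_X = (3/5, 2/5)$, the map `f` and the helper names are assumptions chosen for the example, and mutual information is measured in bits.

```python
import math

def mutual_information_bits(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel W[x][y] = W(y|x)."""
    nx, ny = len(W), len(W[0])
    p_y = [sum(p_x[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(p_x[x] * W[x][y] * math.log2(W[x][y] / p_y[y])
               for x in range(nx) for y in range(ny)
               if p_x[x] > 0 and W[x][y] > 0)

def expand_channel(W, f):
    """W'(y|x') = W(y|f(x')), where f is given as a list indexed by x'."""
    return [list(W[x]) for x in f]

z_channel = [[1.0, 0.0], [0.5, 0.5]]  # illustrative Z-channel
f = [0, 0, 0, 1, 1]                   # m = 5 letters induce P_X = (3/5, 2/5)
m = len(f)
w_prime = expand_channel(z_channel, f)

i_prime = mutual_information_bits([1.0 / m] * m, w_prime)   # uniform input on X'
i_target = mutual_information_bits([0.6, 0.4], z_channel)   # P_X on the original W
i_uniform = mutual_information_bits([0.5, 0.5], z_channel)  # symmetric capacity of W
print(i_prime, i_target, i_uniform)
```

In this example the symmetric capacity of the five-input channel $W'$ exceeds that of $W$, which is the gain the method is after; choosing $m$ prime, as here, keeps the construction free of randomization.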

E. Channels with continuous alphabets 

Although the discussion above has been restricted to channels with discrete input and output alphabets, it should be clear 
that the results hold when the output alphabet is continuous, with minor notational changes. In the more interesting case of 
channels with continuous input alphabets (possibly with input constraints, such as the additive Gaussian noise channel with
an input power constraint) we may readily apply the method of Section III-D to approximate any desired continuous input
distribution for the target channel, and thereby approach its capacity using polar codes.

Appendix

A. Proof of Proposition 3

This proposition was proved in [1] for the binary case $q = 2$. Here, we will reduce the general case to the binary case.

1) Proof of (8): The right hand side (r.h.s.) of (8) equals the channel parameter known as symmetric cutoff rate. More
specifically, it equals the function $E_0(1, Q)$ defined in Gallager [3, Section 5.6] with $Q$ taken as the uniform input distribution.
It is well known (and shown in the same section of [3]) that the cutoff rate cannot be greater than $I(W)$. This completes the
proof of (8).

2) Proof of (9):

Lemma 5. For any $q$-ary channel $W : \mathcal{X} \to \mathcal{Y}$,

\[ I(W) \le \log(q/2) + \sum_{x_1, x_2 \in \mathcal{X} :\, x_1 \ne x_2} \frac{1}{q(q-1)} I(W_{\{x_1, x_2\}}). \tag{20} \]

Proof: Let $(X, Y, X_1, X_2) \sim Q(x) P(x_1, x_2 | x) W(y|x)$ where $Q$ is the uniform distribution on $\mathcal{X}$ and

\[ P(x_1, x_2 | x) = \begin{cases} \frac{1}{2(q-1)} & \text{if } x_1 = x \text{ and } x_2 \ne x_1, \\ \frac{1}{2(q-1)} & \text{if } x_2 = x \text{ and } x_1 \ne x_2, \\ 0 & \text{otherwise.} \end{cases} \]

Clearly we have $I(W) = I(X; Y) \le I(X; Y, X_1, X_2)$. By the chain rule, $I(X; Y, X_1, X_2) = I(X; X_1, X_2) + I(X; Y | X_1, X_2)$.
Now, simple calculations show that $I(X; X_1, X_2)$ and $I(X; Y | X_1, X_2)$ equal the two terms that appear on the right side of
(20). (Intuitively, $(X, Y)$ are the input and output of $W$ and $(X_1, X_2)$ is a side information of value $\log(q/2)$ supplied by a
genie to the receiver.) ■
Note that the summation in (20) can be written as the expectation $E[I(W_{\{X_1, X_2\}})]$ where $(X_1, X_2)$ ranges over all distinct
pairs of letters from $\mathcal{X}$ with equal probability. Next, use the form of (9) for $q = 2$ (which is already established in [1]) to write
$E[I(W_{\{X_1, X_2\}})] \le \log(2)\, E\big[\sqrt{1 - Z(W_{\{X_1, X_2\}})^2}\big]$. Use Jensen's inequality on the function $x \mapsto \sqrt{1 - x^2}$, which is concave for
$0 \le x \le 1$, to obtain $E\big[\sqrt{1 - Z(W_{\{X_1, X_2\}})^2}\big] \le \sqrt{1 - \big(E[Z(W_{\{X_1, X_2\}})]\big)^2}$. Since $Z(W) = E[Z(W_{\{X_1, X_2\}})]$, this completes
the proof of (9).

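3) Proof of (10): For notational simplicity we will let $W_x(\cdot) := W(\cdot \mid x)$. First note that

\[ I(W) = \frac{1}{q} \sum_{x \in \mathcal{X}} D\Big( W_x \,\Big\|\, \frac{1}{q} \sum_{x'} W_{x'} \Big), \]

where $D(\cdot \| \cdot)$ is the Kullback-Leibler divergence. Each term in the above summation can be bounded as

\begin{align*}
D\Big( W_x \,\Big\|\, \frac{1}{q} \sum_{x'} W_{x'} \Big) &= \sum_{y} W_x(y) \log \frac{W_x(y)}{\frac{1}{q} \sum_{x'} W_{x'}(y)} \\
&\le \log e \sum_{y} W_x(y) \, \frac{W_x(y) - \frac{1}{q} \sum_{x'} W_{x'}(y)}{\frac{1}{q} \sum_{x'} W_{x'}(y)} \\
&\le q \log e \sum_{y} \Big| W_x(y) - \frac{1}{q} \sum_{x'} W_{x'}(y) \Big| \\
&= q \log e \, \Big\| W_x - \frac{1}{q} \sum_{x'} W_{x'} \Big\|_1. \tag{21}
\end{align*}

In the above, the first inequality follows from the relation $\ln(a) \le a - 1$, and the second inequality is due to $W_x(y) \le
\sum_{x'} W_{x'}(y)$. The $L_1$ distance on the right hand side of (21) can be bounded, using the triangle inequality, as

\[ \Big\| W_x - \frac{1}{q} \sum_{x'} W_{x'} \Big\|_1 \le \frac{1}{q} \sum_{x'} \| W_x - W_{x'} \|_1. \]

Also, it was shown in [1, Lemma 3] that

\[ \| W_x - W_{x'} \|_1 \le 2 \sqrt{1 - Z(W_{\{x, x'\}})^2}. \]

Combining the inequalities above, we obtain

\begin{align*}
I(W) &\le \frac{2 \log e}{q} \sum_{x, x' :\, x \ne x'} \sqrt{1 - Z(W_{\{x, x'\}})^2} \\
&\le 2(q - 1) \log e \, \sqrt{1 - Z(W)^2},
\end{align*}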



where the last step follows from the concavity of the function $x \mapsto \sqrt{1 - x^2}$ for $0 \le x \le 1$.

B. Proof of Proposition 4

Define the channel $W^{(\pi, u)}$ through

\[ W^{(\pi, u)}(y_1, y_2 \mid x) = W(y_1 \mid x + u) W(y_2 \mid \pi(u)), \]

and let

\[ W^{(\pi)}(y_1, y_2 \mid x) = \frac{1}{q} \sum_{u \in \mathcal{X}} W^{(\pi, u)}(y_1, y_2 \mid x). \]

Note that if one fixes the permutation in the transformation $W \mapsto (W^-, W^+)$ to $\pi$, then $W^- = W^{(\pi)}$.
We will show the stronger result that

\[ Z(W) \le Z(W^{(\pi)}) \le \min\{ q Z(W),\ 2Z(W) + (q - 1) Z(W)^2 \} \]

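for all $\pi$, which will imply Proposition 4 since $Z(W^-) = \frac{1}{q!} \sum_{\pi} Z(W^{(\pi)})$. To prove the upper bound on $Z(W^{(\pi)})$, we write

\begin{align*}
Z(W^{(\pi)}) &= \frac{1}{q(q-1)} \sum_{x, x' \in \mathcal{X} :\, x \ne x'} \sum_{y_1, y_2 \in \mathcal{Y}} \sqrt{ \frac{1}{q} \sum_{u \in \mathcal{X}} W(y_2 \mid \pi(u)) W(y_1 \mid x + u) \cdot \frac{1}{q} \sum_{v \in \mathcal{X}} W(y_2 \mid \pi(v)) W(y_1 \mid x' + v) } \\
&\le \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \sum_{y_1, y_2} \frac{1}{q} \sum_{u} \sqrt{W(y_2 \mid \pi(u)) W(y_1 \mid x + u)} \sum_{v} \sqrt{W(y_2 \mid \pi(v)) W(y_1 \mid x' + v)} \\
&= \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q} \sum_{u} \sum_{y_2} W(y_2 \mid \pi(u)) \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + u)} \tag{22} \\
&\quad + \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q} \sum_{u, v :\, u \ne v} \sum_{y_2} \sqrt{W(y_2 \mid \pi(u)) W(y_2 \mid \pi(v))} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + v)}. \tag{23}
\end{align*}

Note that

\[ \sum_{y_2} W(y_2 \mid \pi(u)) \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + u)} = Z(W_{\{x+u, x'+u\}}) \]

for any $u \in \mathcal{X}$. Therefore the r.h.s. of (22) is equal to $Z(W)$. Also, note that the innermost sum over $y_1$ in (23) is upper
bounded by 1. Therefore, (23) is upper bounded by $(q - 1) Z(W)$. Alternatively, noting that for any fixed $u \ne v$

\begin{align*}
\sum_{x, x' :\, x \ne x'} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + v)} &= q + \sum_{\substack{x, x' :\, x \ne x' \\ x + u \ne x' + v}} \sum_{y_1} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + v)} \\
&\le q + q(q - 1) Z(W),
\end{align*}

we have

\begin{align*}
\text{r.h.s. of (23)} &\le \big(1 + (q - 1) Z(W)\big) \frac{1}{q(q-1)} \sum_{u, v :\, u \ne v} \sum_{y_2} \sqrt{W(y_2 \mid u) W(y_2 \mid v)} \\
&= Z(W) + (q - 1) Z(W)^2.
\end{align*}

This in turn implies $Z(W^{(\pi)}) \le \min\{ q Z(W),\ 2Z(W) + (q - 1) Z(W)^2 \}$.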

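The proof of $Z(W) \le Z(W^{(\pi)})$ follows from the concavity of $Z(W_{\{x, x'\}})$ in $W$, shown in [1]:

\begin{align*}
Z(W^{(\pi)}) &= \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} Z(W^{(\pi)}_{\{x, x'\}}) \\
&\ge \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q} \sum_{u} Z(W^{(\pi, u)}_{\{x, x'\}}) \\
&= \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} \frac{1}{q} \sum_{u} \sum_{y_1, y_2} \sqrt{W(y_1 \mid x + u) W(y_1 \mid x' + u) W(y_2 \mid \pi(u)) W(y_2 \mid \pi(u))} \\
&= \frac{1}{q} \sum_{u} \frac{1}{q(q-1)} \sum_{x, x' :\, x \ne x'} Z(W_{\{x+u, x'+u\}}) \\
&= Z(W).
\end{align*}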